Week7 day 2 - Spatial Data / spatial analysis
What do you think spatial data looks like? Latitude and longitude
What do you think spatial data visualisation looks like?
Spatial (geospatial data) is data about geographical locations
How is spatial data represented?
Spatial data are special - there are meaning to the points; while encoded as numbers (dbl’s), these numbers actually have a special meaning
Spatial vectors (non-R specific)
Spatial vectors encode spatial data - there are 3 main types
- point (1, 4)
- line ((1,4), (4,5))
- polygon ((2,2), (4,2), (5,3), etc…)
point; train stations, wells, etc.. lines; roads.. polygons; council areas, lakes
Spatial data in R
Sometimes we just get given data frames with lat/long columns
Another option is to load in shapefiles
to do this, we use the package sf - simplefeatures
library(sf)
Linking to GEOS 3.9.1, GDAL 3.2.3, PROJ 7.2.1
a feature = a geometry
north_carolina <- st_read(system.file("shape/nc.shp", package = "sf"))
Reading layer `nc' from data source
`/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/library/sf/shape/nc.shp'
using driver `ESRI Shapefile'
Simple feature collection with 100 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
Geodetic CRS: NAD27
library(tidyverse)
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
── Attaching packages ─────────────────────────────────────────────────────────────────── tidyverse 1.3.1 ──
✓ ggplot2 3.3.5 ✓ purrr 0.3.4
✓ tibble 3.1.5 ✓ dplyr 1.0.7
✓ tidyr 1.1.4 ✓ stringr 1.4.0
✓ readr 2.0.2 ✓ forcats 0.5.1
── Conflicts ────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
north_carolina %>%
as_tibble()
nc_geo <- st_geometry(north_carolina)
nc_geo[[1]]
MULTIPOLYGON (((-81.47276 36.23436, -81.54084 36.27251, -81.56198 36.27359, -81.63306 36.34069, -81.74107 36.39178, -81.69828 36.47178, -81.7028 36.51934, -81.67 36.58965, -81.3453 36.57286, -81.34754 36.53791, -81.32478 36.51368, -81.31332 36.4807, -81.26624 36.43721, -81.26284 36.40504, -81.24069 36.37942, -81.23989 36.36536, -81.26424 36.35241, -81.32899 36.3635, -81.36137 36.35316, -81.36569 36.33905, -81.35413 36.29972, -81.36745 36.2787, -81.40639 36.28505, -81.41233 36.26729, -81.43104 36.26072, -81.45289 36.23959, -81.47276 36.23436)))
class(north_carolina)
[1] "sf" "data.frame"
it’s a dataframe with geometries
Plotting geometries
plot(north_carolina["AREA"])

# first row, "AREA" column
plot(north_carolina[1, "AREA"])

Task - Have a look through some of the variables within your north_carolina dataset, and see if you can create a spatial plot using the techniques above.
plot(north_carolina["FIPS"])

ggplot and sf
ggplot can plot spatial with sf, called geom_sf
north_carolina %>%
ggplot() +
geom_sf(aes(fill = SID74), colour = "black") +
theme_bw()

Task - try plotting another feature for NC. What does it tell you? play around with ggplot aspects (e.g. colour)
north_carolina %>%
ggplot() +
geom_sf(aes(fill = FIPS), colour = "black", show.legend = FALSE) +
theme_void() +
ylim(20, 50)

NA
library(rgeos)
Loading required package: sp
rgeos version: 0.5-8, (SVN revision 679)
GEOS runtime version: 3.9.1-CAPI-1.14.2
Please note that rgeos will be retired by the end of 2023,
plan transition to sf functions using GEOS at your earliest convenience.
GEOS using OverlayNG
Linking to sp version: 1.4-5
Polygon checking: TRUE
library(rnaturalearth)
library(rnaturalearthdata)
Using rnaturalearth package we can import boundaries of countries at various levels
world <- ne_countries(scale = "medium", returnclass = "sf")
world %>%
as_tibble() %>%
head(5)
So we’ve got the data, now we can plot it
world %>%
ggplot() +
geom_sf() +
labs(x = "longitude", y = "latitude", title = "World Map")

We can plot actual data now:
world %>%
ggplot() +
geom_sf(aes(fill = pop_est)) +
scale_fill_viridis_c(trans = "sqrt")

Task: Recap your knowledge from ggplot week, and set your geom_sf aesthetic to be filled with the estimated gdp (gdp_md_est variable). Extra points if you make your map colour blind friendly! What does your plot tell you? What does it tell you compared to the population?
world %>%
ggplot() +
geom_sf(aes(fill = gdp_md_est)) +
scale_fill_continuous(type = "viridis") +
theme_void()

# It tells me that while China and India are greater than the rest of the world
# in terms of population, the US is greater than the China and India in terms
# of GDP (despite having lower population)
With our world data frame, we can filter for specific countries, using the tidyverse
country_italy <- world %>%
filter(name == "Italy")
# plot just italy
country_italy %>%
ggplot() +
geom_sf() +
labs(x = "Longitude",
y = "Latitude",
title = "Italy")

country_denmark <- world %>%
filter(name == "Denmark")
country_denmark %>%
ggplot() +
geom_sf() +
labs(x = "Longitude",
y = "Latitude",
title = "Danmark")

country_faroes <- world %>%
filter(name == "Faeroe Is.")
country_faroes%>%
ggplot() +
geom_sf() +
labs(x = "Longitude",
y = "Latitude",
title = "Færøerne")

Zooming in on particular parts of the world
- we can subset our graph by limiting the x and y range using coord_sf()
world %>%
ggplot() +
geom_sf() +
coord_sf(xlim = c(-102.15, -74.12), ylim = c(7.65, 33.97), expand = FALSE)

Add informative labels
- we need to tell ggplot where to put the label
- therefore we need to calculate where we want to put the labels (centre of each feature)
- we need to calculate the centres (centroids)
world %>%
mutate(centre = st_centroid(st_make_valid(geometry))) %>%
as_tibble() %>%
select(name, centre)
world_with_centres <- world %>%
mutate(centre = st_centroid(st_make_valid(geometry))) %>%
mutate(lat = st_coordinates(centre)[,1],
long = st_coordinates(centre)[,2])
# centre geometries
# purely illustrative
centre_geo <- world %>%
mutate(centre = st_centroid(st_make_valid(geometry))) %>%
select(centre)
centre_geo[[1]][[1]]
POINT (-69.98266 12.52088)
world_with_centres %>%
ggplot() +
geom_sf() +
geom_text(aes(x = lat, y = long, label = name), colour = "darkblue",
fontface = "bold", check_overlap = TRUE, size = 3) +
coord_sf(xlim = c(-102.15, -74.12), ylim = c(7.65, 33.97), expand = FALSE)

We can add additional information using annotate
- add one that says: Gulf of Mexico
world_with_centres %>%
ggplot() +
geom_sf() +
geom_text(aes(x = lat, y = long, label = name), colour = "darkblue",
fontface = "bold", check_overlap = TRUE, size = 3) +
annotate(geom = "text", x = -90, y = 26, label = "Gulf of Mexico", size = 5,
fontface = "italic")+
coord_sf(xlim = c(-102.15, -74.12), ylim = c(7.65, 33.97), expand = FALSE)

Task
world_with_centres %>%
ggplot() +
geom_sf(aes(fill = income_grp)) +
geom_text(aes(x = lat, y = long, label = name), colour = "black",
fontface = "bold", check_overlap = TRUE, size = 3) +
annotate(geom = "text", x = -90, y = 26, label = "Gulf of Mexico", size = 3,
fontface = "italic")+
coord_sf(xlim = c(-110.15, -60.12), ylim = c(2.65, 50.97), expand = TRUE) +
labs(x = "Latitude",
y = "Longitude",
fill = "OECD Income group")

Interactive maps with leaflet
library(leaflet)
Registered S3 method overwritten by 'htmlwidgets':
method from
print.htmlwidget tools:rstudio
leaflet() %>%
addTiles() %>% #basemap
addMarkers(lng = 174.768, lat = -36.852, popup = "The birthplace of R")
That was our (maybe) first leaflet
Get some spatial data from the web - turn it into a format R can work with - visualise it using leaflet
library(jsonlite)
Attaching package: ‘jsonlite’
The following object is masked from ‘package:purrr’:
flatten
colorado_data_url <-
"https://data.colorado.gov/resource/j5pc-4t32.json?&county=BOULDER"
head(readLines(colorado_data_url))
Warning in readLines(colorado_data_url) :
incomplete final line found on 'https://data.colorado.gov/resource/j5pc-4t32.json?&county=BOULDER'
[1] "[ {"
[2] " \"station_name\" : \"PECK-PELLA AUGMENTATION RETURN\","
[3] " \"div\" : \"1\","
[4] " \"location\" : {"
[5] " \"latitude\" : \"40.160705\","
[6] " \"needs_recoding\" : false,"
jsonlite package has a lot of functions to help work with json data
json is like a list of lists in R - sometimes we need to recursively sift through to extract the relevant data
colorado_water <- fromJSON(colorado_data_url) %>%
jsonlite::flatten(recursive = TRUE)
# wrangling
colorado_water_clean <- colorado_water %>%
select(-location.needs_recoding) %>%
mutate(across(c(starts_with("location"), amount), as.numeric)) %>%
filter(!is.na(location.latitude), !is.na(location.longitude))
# visualise with leaflet
colorado_water_clean %>%
leaflet() %>%
addTiles() %>%
addCircleMarkers(lng = ~location.longitude,
lat = ~location.latitude)
- incidences of surface water in and around Boulder, Colorado
colorado_water_clean %>%
leaflet() %>%
addTiles() %>%
addCircleMarkers(lng = ~location.longitude,
lat = ~location.latitude,
radius = ~log(amount), weight = 1)
Warning in log(amount) : NaNs produced
Clustering
Let’s have a look at what addMarkers looks like if we have lots of points
colorado_water_clean %>%
leaflet() %>%
addTiles() %>%
addMarkers(lng = ~location.longitude,
lat = ~location.latitude)
We can add clustering options
colorado_water_clean %>%
leaflet() %>%
addTiles() %>%
addMarkers(lng = ~location.longitude,
lat = ~location.latitude,
clusterOptions = markerClusterOptions())
Leaflet in shiny
maybe too many? allow user to filter for specific regions only view distilleries in that region
whisky_df %>%
leaflet() %>%
addTiles() %>%
addCircleMarkers(lat = ~lat, lng = ~long, popup = ~Distillery)
---
title: "R Notebook"
output: html_notebook
---

Week7 day 2 - Spatial Data / spatial analysis

What do you think spatial data looks like?
  Latitude and longitude
  
  
What do you think spatial data visualisation looks like?
  
Spatial (geospatial data) is data about geographical locations

## How is spatial data represented?


Spatial data are special - there are meaning to the points; while encoded
as numbers (dbl's), these numbers actually have a special meaning

Spatial vectors (non-R specific)

Spatial vectors encode spatial data - there are 3 main types

- point (1, 4)
- line ((1,4), (4,5))
- polygon ((2,2), (4,2), (5,3), etc...)

point; train stations, wells, etc..
lines; roads..
polygons; council areas, lakes

## Spatial data in R

Sometimes we just get given data frames with lat/long columns

Another option is to load in shapefiles

to do this, we use the package sf - simplefeatures

```{r}
library(sf)
```

a feature = a geometry

```{r}
north_carolina <- st_read(system.file("shape/nc.shp", package = "sf"))
```

```{r}
library(tidyverse)
```


```{r}
north_carolina %>%
  as_tibble()
```

```{r}
nc_geo <- st_geometry(north_carolina)

nc_geo[[1]]
```

```{r}
class(north_carolina)
```
it's a dataframe with geometries


## Plotting geometries


```{r}
plot(north_carolina["AREA"])
```

```{r}
# first row, "AREA" column
plot(north_carolina[1, "AREA"])
```

Task - Have a look through some of the variables within your north_carolina dataset, and see if you can create a spatial plot using the techniques above.
```{r}
plot(north_carolina["FIPS"])
```
## ggplot and sf

ggplot can plot spatial with sf, called geom_sf

```{r}
north_carolina %>%
  ggplot() +
  geom_sf(aes(fill = SID74), colour = "black") +
  theme_bw()
```

Task - try plotting another feature for NC. What does it tell you? play around
with ggplot aspects (e.g. colour)
```{r}
north_carolina %>%
  ggplot() +
  geom_sf(aes(fill = FIPS), colour = "black", show.legend = FALSE) +
  theme_void() +
  ylim(20, 50)
  
```

```{r}
library(rgeos)
library(rnaturalearth)
library(rnaturalearthdata)
```

Using rnaturalearth package we can import boundaries of countries at various
levels

```{r}
world <- ne_countries(scale = "medium", returnclass = "sf")
```

```{r}
world %>%
  as_tibble() %>%
  head(5)
```

So we've got the data, now we can plot it

```{r}
world %>%
  ggplot() +
  geom_sf() +
  labs(x = "longitude", y = "latitude", title = "World Map")
```

We can plot actual data now: 

```{r}
world %>%
  ggplot() +
  geom_sf(aes(fill = pop_est)) +
  scale_fill_viridis_c(trans = "sqrt")
```

Task: Recap your knowledge from ggplot week, and set your geom_sf aesthetic to be filled with the estimated gdp (gdp_md_est variable). Extra points if you make your map colour blind friendly!
What does your plot tell you? What does it tell you compared to the population?

```{r}
world %>%
  ggplot() +
  geom_sf(aes(fill = gdp_md_est)) +
  scale_fill_continuous(type = "viridis") +
  theme_void()

# It tells me that while China and India are greater than the rest of the world
# in terms of population, the US is greater than the China and India in terms
# of GDP (despite having lower population)
```

With our world data frame, we can filter for specific countries, using the 
tidyverse

```{r}
country_italy <- world %>%
  filter(name == "Italy")
```

```{r}
# plot just italy

country_italy %>%
  ggplot() +
  geom_sf() +
  labs(x = "Longitude",
       y = "Latitude",
       title = "Italy")
```


```{r}
country_denmark <- world %>%
  filter(name == "Denmark")

country_denmark %>%
  ggplot() +
  geom_sf() +
  labs(x = "Longitude",
       y = "Latitude",
       title = "Danmark")

country_faroes <- world %>%
  filter(name == "Faeroe Is.")

country_faroes%>%
  ggplot() +
  geom_sf() +
  labs(x = "Longitude",
       y = "Latitude",
       title = "Færøerne")

```

## Zooming in on particular parts of the world

- we can subset our graph by limiting the x and y range using coord_sf()

```{r}
world %>%
  ggplot() +
  geom_sf() +
  coord_sf(xlim = c(-102.15, -74.12), ylim = c(7.65, 33.97), expand = FALSE)
```

Add informative labels

- we need to tell ggplot where to put the label
- therefore we need to calculate where we want to put the labels (centre of 
each feature)
- we need to calculate the centres (centroids)

```{r}
world %>%
  mutate(centre = st_centroid(st_make_valid(geometry))) %>%
  as_tibble() %>%
  select(name, centre)
```

```{r}
world_with_centres <- world %>%
  mutate(centre = st_centroid(st_make_valid(geometry))) %>%
  mutate(lat = st_coordinates(centre)[,1],
         long = st_coordinates(centre)[,2])
```


```{r}
# centre geometries
# purely illustrative

centre_geo <- world %>%
  mutate(centre = st_centroid(st_make_valid(geometry))) %>%
  select(centre)
```


```{r}
centre_geo[[1]][[1]]
```


```{r}
world_with_centres %>%
  ggplot() +
  geom_sf() +
  geom_text(aes(x = lat, y = long, label = name), colour = "darkblue",
            fontface = "bold", check_overlap = TRUE, size = 3) +
  coord_sf(xlim = c(-102.15, -74.12), ylim = c(7.65, 33.97), expand = FALSE)
```

We can add additional information using annotate 

- add one that says: Gulf of Mexico

```{r}
world_with_centres %>%
  ggplot() +
  geom_sf() +
  geom_text(aes(x = lat, y = long, label = name), colour = "darkblue",
            fontface = "bold", check_overlap = TRUE, size = 3) +
  annotate(geom = "text", x = -90, y = 26, label = "Gulf of Mexico", size = 5,
           fontface = "italic")+
  coord_sf(xlim = c(-102.15, -74.12), ylim = c(7.65, 33.97), expand = FALSE) 
```

Task
```{r}
world_with_centres %>%
  ggplot() +
  geom_sf(aes(fill = income_grp)) +
  geom_text(aes(x = lat, y = long, label = name), colour = "black",
            fontface = "bold", check_overlap = TRUE, size = 3) +
  annotate(geom = "text", x = -90, y = 26, label = "Gulf of Mexico", size = 3,
           fontface = "italic")+
  coord_sf(xlim = c(-110.15, -60.12), ylim = c(2.65, 50.97), expand = TRUE) +
  labs(x = "Latitude",
       y = "Longitude",
       fill = "OECD Income group")
```

## Interactive maps with leaflet

```{r}
library(leaflet)
```


```{r}
leaflet() %>%
  addTiles() %>% #basemap
  addMarkers(lng = 174.768, lat = -36.852, popup = "The birthplace of R")
```

That was our (maybe) first leaflet

Get some spatial data from the web
- turn it into a format R can work with
- visualise it using leaflet

```{r}
library(jsonlite)
```

```{r}
colorado_data_url <-
  "https://data.colorado.gov/resource/j5pc-4t32.json?&county=BOULDER"
```

```{r}
head(readLines(colorado_data_url))
```

jsonlite package has a lot of functions to help work with json data

json is like a list of lists in R - sometimes we need to recursively sift through
to extract the relevant data

```{r}
colorado_water <- fromJSON(colorado_data_url) %>%
  jsonlite::flatten(recursive = TRUE)
```

```{r}
# wrangling
colorado_water_clean <- colorado_water %>%
  select(-location.needs_recoding) %>%
  mutate(across(c(starts_with("location"), amount), as.numeric)) %>%
  filter(!is.na(location.latitude), !is.na(location.longitude))
  
```

```{r}
# visualise with leaflet

colorado_water_clean %>%
  leaflet() %>%
  addTiles() %>%
  addCircleMarkers(lng = ~location.longitude,
                   lat = ~location.latitude)
```

- incidences of surface water in and around Boulder, Colorado

```{r}
colorado_water_clean %>%
  leaflet() %>%
  addTiles() %>%
  addCircleMarkers(lng = ~location.longitude,
                   lat = ~location.latitude,
                   radius = ~log(amount), weight = 1)
```

## Clustering

Let's have a look at what addMarkers looks like if we have lots of points

```{r}
colorado_water_clean %>%
  leaflet() %>%
  addTiles() %>%
  addMarkers(lng = ~location.longitude,
                   lat = ~location.latitude)
```

We can add clustering options

```{r}
colorado_water_clean %>%
  leaflet() %>%
  addTiles() %>%
  addMarkers(lng = ~location.longitude,
                   lat = ~location.latitude,
             clusterOptions = markerClusterOptions())
```

## Leaflet in shiny

```{r}
whisky_df <- CodeClanData::whisky %>%
  rename(long = Latitude,
         lat = Longitude)
```

maybe too many?
allow user to filter for specific regions
only view distilleries in that region

```{r}
whisky_df %>%
  leaflet() %>%
  addTiles() %>%
  addCircleMarkers(lat = ~lat, lng = ~long, popup = ~Distillery)
```

